Selecting the Best Tridiagonal System Solver Projected on Multi-Core CPU and GPU Platforms
Authors
Abstract
Nowadays multicore processors and graphics cards are commodity hardware that can be found in personal computers. Both CPU and GPU are capable of performing high-end computations. In this paper we present and compare parallel implementations of two tridiagonal system solvers. We analyze the cyclic reduction method, as an example of fine-grained parallelism, and Bondeli’s algorithm, as a coarse-grained example of parallelism. Both algorithms are implemented for GPU architectures using CUDA and multi-core CPU with shared memory architectures using OpenMP. The results are compared in terms of execution time, speedup, and GFLOPS. For a large system of equations, 2, the best results were obtained for Bondeli’s algorithm (speedup 1.55x and 0.84 GFLOPS) for multi-core CPU platforms while the cyclic reduction (speedup 17.06x and 5.09 GFLOPS) was the best for the case of GPU platforms.
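As an illustration of the fine-grained method named in the abstract, the following is a minimal sequential sketch of cyclic reduction in Python. It is not the paper's CUDA or OpenMP implementation. The function name `cyclic_reduction` and the array names `a`, `b`, `c`, `d` (sub-diagonal, diagonal, super-diagonal, right-hand side) are assumptions made for this example, and the sketch assumes a system, such as a diagonally dominant one, in which no pivot becomes zero. Within each level, every odd-indexed row is reduced independently of the others, which is the property a GPU implementation would exploit.

```python
def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] for x.

    Assumes a[0] == 0 and c[-1] == 0, and that no pivot b[i] is zero
    (for example, a diagonally dominant system).
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Forward reduction: eliminate the even-indexed unknowns so that only
    # the odd-indexed ones remain. Each loop iteration is independent.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        alpha = a[i] / b[i - 1]          # removes x[i-1] using row i-1
        new_a = -alpha * a[i - 1]
        new_b = b[i] - alpha * c[i - 1]
        new_d = d[i] - alpha * d[i - 1]
        new_c = 0.0
        if i + 1 < n:
            gamma = c[i] / b[i + 1]      # removes x[i+1] using row i+1
            new_b -= gamma * a[i + 1]
            new_d -= gamma * d[i + 1]
            new_c = -gamma * c[i + 1]
        ra.append(new_a)
        rb.append(new_b)
        rc.append(new_c)
        rd.append(new_d)

    # Recursively solve the half-size tridiagonal system.
    x = [0.0] * n
    x[1::2] = cyclic_reduction(ra, rb, rc, rd)

    # Back substitution: recover the even-indexed unknowns.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

Each level halves the number of unknowns, so a system of size n is reduced in about log2(n) levels. In this sketch the levels are expressed as recursion, whereas a parallel version would typically run each level as one data-parallel step.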
Similar resources
Performance Evaluation and Analysis for Conjugate Gradient Solver on Heterogeneous (Multi-GPUs/Multi-CPUs) platforms
High performance computing (HPC) presents a technology that allows solving high intensive problems in a reasonable period of time, and can offer many advantages for large applications in various fields of science and industry. Current multi-core processors, especially graphic processing units (GPUs), have quickly evolved to become efficient accelerators for data parallel computing. They can mai...
GPGPU parallel algorithms for structured-grid CFD codes
A new high-performance general-purpose graphics processing unit (GPGPU) computational fluid dynamics (CFD) library is introduced for use with structured-grid CFD algorithms. A novel set of parallel tridiagonal matrix solvers, implemented in CUDA, is included for use with structured-grid CFD algorithms. The solver library supports both scalar and block-tridiagonal matrices suitable for approxima...
Efficient heterogeneous execution on large multicore and accelerator platforms: Case study using a block tridiagonal solver
The algorithmic and implementation principles are explored in gainfully exploiting GPU accelerators in conjunction with multicore processors on high-end systems with large numbers of compute nodes, and evaluated in an implementation of a scalable block tridiagonal solver. The accelerator of each compute node is exploited in combination with multicore processors of that node in performing block-...
The Comparison of Parallel Sorting Algorithms Implemented on Different Hardware Platforms
Sorting is a common problem in computer science. There are a lot of well-known sorting algorithms created for sequential execution on a single processor. Recently, many-core and multi-core platforms have enabled the creation of wide parallel algorithms. We have standard processors that consist of multiple cores and hardware accelerators, like the GPU. Graphic cards, with their parallel architect...
A Memory-Efficient Algorithm for Large-Scale Symmetric Tridiagonal Eigenvalue Problem on Multi-GPU Systems
Divide-and-conquer algorithm is a numerically stable and efficient algorithm that computes the eigenvalues and eigenvectors of a symmetric tridiagonal matrix. We often face the situation where the input matrix fits into the main memory but not into the on-chip memory of a GPU device. We present an out-of-core implementation where only part of the input matrix is resident in GPU memory at any po...
Journal title:
Volume, issue
Pages -
Publication date: 2011